-
Notifications
You must be signed in to change notification settings - Fork 56
refactor: XYNE-103 Attachments are moved to file.sd schema #1065
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughCentralizes file-type detection into a shared utility; frontend ChatBox removeFile becomes async and calls a new server-side delete API while still removing files locally on failure; server adds handleAttachmentDelete + POST /api/v1/files/delete, broadens attachment-entity checks, and extends Vespa file schema with image chunking and embeddings. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor User
participant Frontend as ChatBox (UI)
participant API as /api/v1/files/delete
participant FilesSvc as server/api/files.ts
participant Handler as handleAttachmentDelete
participant Disk as Disk (images)
participant Vespa as Vespa (fileSchema)
User->>Frontend: Click remove attachment
Frontend->>API: POST /api/v1/files/delete { attachment }
API->>FilesSvc: handleAttachmentDeleteApi(payload)
FilesSvc->>Handler: handleAttachmentDelete([attachment], ownerEmail)
alt attachment is image
Handler->>Disk: remove image directory/files
Handler->>Vespa: DeleteDocument(fileId) (fileSchema)
else non-image attachment
Handler->>Vespa: DeleteDocument(fileId) (fileSchema or KbItemsSchema by prefix)
end
Handler-->>FilesSvc: deletion result
FilesSvc-->>API: 200 / error
API-->>Frontend: response (Frontend still removes file locally)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @Himanshvarma, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request, authored by Himanshvarma, refactors the system to store and manage attachments within a dedicated Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This is a significant and well-executed refactoring to move attachments to the file.sd
schema. The changes are consistent across the frontend, backend, and Vespa schema definitions. I've identified a few areas for improvement, mainly concerning error handling and code style, to enhance the robustness and readability of the new implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (2)
server/api/files.ts (2)
428-434
: Consider error propagation for DeleteImages failures.If
DeleteImages
fails but the subsequent Vespa deletion succeeds, orphaned images remain on disk. Consider whether to:
- Rethrow the error to prevent Vespa deletion
- Collect failed deletions and report them
- Continue as-is if orphaned images are acceptable
486-490
: Prefer explicit typing over 'as any'.While the
Array.isArray
check makes this safe, usingany
reduces type safety.Consider this more type-safe approach:
- const fields = attachmentDoc.fields as any - const permissions = Array.isArray(fields.permissions) ? fields.permissions as string[] : [] + const fields = attachmentDoc.fields as Record<string, unknown> + const permissions = Array.isArray(fields.permissions) + ? (fields.permissions as string[]).filter((p): p is string => typeof p === 'string') + : []
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
server/api/files.ts
(5 hunks)server/package.json
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server/api/files.ts (7)
server/shared/types.ts (2)
attachmentFileTypeMap
(210-218)AttachmentMetadata
(707-707)server/shared/fileUtils.ts (2)
getFileType
(10-32)isImageFile
(4-8)server/integrations/dataSource/config.ts (2)
getBaseMimeType
(74-76)isImageFile
(89-92)server/search/vespa.ts (3)
insert
(20-20)DeleteDocument
(24-24)GetDocument
(21-21)server/utils.ts (1)
getErrorMessage
(103-106)server/api/dataSource.ts (1)
DeleteImages
(235-261)server/api/search.ts (1)
handleAttachmentDeleteSchema
(259-261)
🔇 Additional comments (6)
server/api/files.ts (6)
8-8
: LGTM!Import additions support the new attachment deletion and file schema migration functionality.
Also applies to: 15-15, 18-18, 23-23, 25-25, 27-27, 30-32
384-393
: LGTM!Path traversal validation prevents security vulnerabilities by rejecting fileIds with directory traversal characters.
436-440
: LGTM!Backward compatibility logic elegantly handles both old (
att_
) and new (attf_
) fileId prefixes during the migration period.
487-490
: LGTM!The permission check correctly handles the case where
fields.permissions
might not be an array, addressing the previous review concern.
266-266
: ProcessFile return value includes all destructured fields The static async processFile method in server/services/fileProcessor.ts returns chunks, chunks_pos, image_chunks, and image_chunks_pos as expected.
230-230
: No legacyatt_
prefix found outside tests
Non-test code contains no hardcodedatt_
prefixes andhandleAttachmentDelete
already handles bothatt_
andattf_
. Tests and example comments still use the old prefix—a cosmetic concern only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/api/files.ts (1)
272-301
: Sheet ingestion bug: using fileId/title instead of per‑sheet docId/fileName.For spreadsheets, you compute docId and fileName per processingResult but insert with docId: fileId and title: file.name. This will overwrite sheets or create duplicate IDs.
Apply this diff:
- const vespaDoc = { - title: file.name, + const vespaDoc = { + title: fileName, url: "", app: Apps.Attachment, - docId: fileId, + docId: docId, parentId: null, owner: email, photoLink: "", ownerEmail: email, - entity: attachmentFileTypeMap[getFileType({ type: file.type, name: file.name })], + entity: attachmentFileTypeMap[getFileType({ type: file.type, name: file.name })], chunks: chunks, chunks_pos: chunks_pos, image_chunks: image_chunks, image_chunks_pos: image_chunks_pos, chunks_map: processingResult.chunks_map, image_chunks_map: processingResult.image_chunks_map, permissions: [email], mimeType: getBaseMimeType(file.type || "text/plain"), metadata: JSON.stringify({ - originalFileName: file.name, + originalFileName: fileName, uploadedBy: email, chunksCount: chunks.length, imageChunksCount: image_chunks.length, processingMethod: getBaseMimeType(file.type || "text/plain"), lastModified: Date.now(), ...(('sheetName' in processingResult) && { sheetName: (processingResult as SheetProcessingResult).sheetName, sheetIndex: (processingResult as SheetProcessingResult).sheetIndex, totalSheets: (processingResult as SheetProcessingResult).totalSheets, }), }), createdAt: Date.now(), updatedAt: Date.now(), }
♻️ Duplicate comments (3)
server/package.json (1)
81-81
: Blocking:@xyne/vespa-ts@1.1.2
appears unpublished; installs will fail unless a private registry serves it.If your
@xyne
scope isn’t pointed to a private registry hosting 1.1.2, please either publish 1.1.2, downgrade to a published version, or reference a git commit.Run to verify registry setup and availability:
#!/bin/bash set -euo pipefail echo "== Check for scoped registry config (.npmrc) ==" fd -H '.npmrc' -x sh -c 'echo "--- {} ---"; cat {}' || true echo echo "== Look for publishConfig in package.json files ==" fd package.json -t f -x jq -r 'select(.publishConfig!=null) | .name + " -> " + (.publishConfig.registry // "no registry")' || true echo echo "== Check lockfiles referencing @xyne/vespa-ts ==" fd -H 'bun.lockb|package-lock.json|pnpm-lock.yaml|yarn.lock' -E 'node_modules' -x sh -c 'echo "--- {} ---"; rg -n "@xyne/vespa-ts" {} || true' || true echo echo "== Public npm versions for @xyne/vespa-ts (if any) ==" curl -fsS https://registry.npmjs.org/%40xyne%2Fvespa-ts | jq -r '.["dist-tags"], (.versions|keys|sort|join(", "))' || echo "Package not visible on public npm"If you need a quick unblock and 1.0.4 suffices, change to a published version:
- "@xyne/vespa-ts": "1.1.2", + "@xyne/vespa-ts": "1.0.4",If 1.1.2 is required and not yet published, consider a temporary git reference:
- "@xyne/vespa-ts": "1.1.2", + "@xyne/vespa-ts": "git+ssh://git@github.com/xynehq/vespa-ts.git#<commit-sha>",server/api/files.ts (2)
297-303
: Optional: add defensive fallback for entity mapping.getFileType defaults to FileType.FILE and attachmentFileTypeMap covers FILE, so this is safe today. Adding a fallback guards future changes.
- entity: attachmentFileTypeMap[getFileType({ type: file.type, name: file.name })], + entity: + attachmentFileTypeMap[getFileType({ type: file.type, name: file.name })] ?? + AttachmentEntity.File,
505-536
: Unauthorized deletion risk for images; enforce permission/ownership.Image attachments are deleted without verifying requester’s access. A caller can supply any fileId and set isImage=true.
Suggested approach:
- Validate that the image attachment belongs to the user before deletion by checking your attachments store (e.g., a lookup by fileId and email in the attachments table), or persist image metadata in Vespa and use the same permissions check as non‑images.
- Alternatively, require messageId in the payload and verify the attachment exists on that message for the user.
Minimal patch sketch:
- if(isImage) { - await handleAttachmentDelete([attachment], email) + if (isImage) { + // Verify ownership via attachments metadata + const owned = await findAttachmentByFileIdAndEmail(fileId, email) // implement in db/attachment + if (!owned) { + throw new HTTPException(403, { message: "Access denied to this attachment" }) + } + await handleAttachmentDelete([attachment], email) return c.json({ success: true, message: "Attachment deleted successfully" }) }If a DB helper doesn’t exist yet, I can draft it. Do you want me to wire this up?
🧹 Nitpick comments (2)
server/api/chat/chat.ts (1)
511-516
: Broadened attachment citation skip is correct; consider a small perf tweak.The Object.values(AttachmentEntity).includes(...) check runs in a tight streaming loop. Precompute a Set once and use .has().
Apply within this block:
- Object.values(AttachmentEntity).includes(f?.entity) + attachmentEntities.has(f?.entity)Add near imports:
const attachmentEntities = new Set(Object.values(AttachmentEntity))server/api/files.ts (1)
384-492
: Centralized deletion helper is good; add small safety improvements.
- Deduplicate fileIds to avoid repeated I/O/calls on duplicate metadata.
- Consider short‑circuiting when arrays are empty to skip logging.
Example:
- const imageAttachmentFileIds: string[] = [] - const nonImageAttachmentFileIds: string[] = [] + const imageAttachmentFileIds = new Set<string>() + const nonImageAttachmentFileIds = new Set<string>() ... - imageAttachmentFileIds.push(attachment.fileId) + imageAttachmentFileIds.add(attachment.fileId) ... - nonImageAttachmentFileIds.push(attachment.fileId) + nonImageAttachmentFileIds.add(attachment.fileId) ... - if (imageAttachmentFileIds.length > 0) { + if (imageAttachmentFileIds.size > 0) { ... - for (const fileId of imageAttachmentFileIds) { + for (const fileId of imageAttachmentFileIds) { ... - if (nonImageAttachmentFileIds.length > 0) { + if (nonImageAttachmentFileIds.size > 0) { ... - for (const fileId of nonImageAttachmentFileIds) { + for (const fileId of nonImageAttachmentFileIds) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
server/api/chat/agents.ts
(2 hunks)server/api/chat/chat.ts
(4 hunks)server/api/files.ts
(6 hunks)server/package.json
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- server/api/chat/agents.ts
🧰 Additional context used
🧬 Code graph analysis (2)
server/api/files.ts (8)
server/shared/types.ts (2)
attachmentFileTypeMap
(210-218)AttachmentMetadata
(707-707)server/shared/fileUtils.ts (2)
getFileType
(10-32)isImageFile
(4-8)server/integrations/dataSource/config.ts (2)
getBaseMimeType
(69-71)isImageFile
(84-87)server/search/vespa.ts (3)
insert
(20-20)DeleteDocument
(24-24)GetDocument
(21-21)server/utils.ts (1)
getErrorMessage
(103-106)server/api/chat/chat.ts (1)
expandSheetIds
(223-242)server/api/dataSource.ts (1)
DeleteImages
(235-261)server/api/search.ts (1)
handleAttachmentDeleteSchema
(259-261)
server/api/chat/chat.ts (3)
server/shared/types.ts (2)
AttachmentEntity
(39-39)AttachmentMetadata
(707-707)server/db/message.ts (1)
getChatMessagesWithAuth
(47-58)server/api/files.ts (1)
handleAttachmentDelete
(384-492)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
server/integrations/google/sync.ts (1)
372-380
: Good refactor to avoid mutation, but consider making the array path consistent.The change correctly avoids mutating
vespaData.permissions
in place by creating a new object with the spread operator. However, the array path (lines 362-370) still mutatesv.permissions
within the map callback, creating an inconsistency.Consider updating the array path for consistency:
if (Array.isArray(vespaData)) { - let allData: VespaFileWithDrivePermission[] = [...vespaData].map( - (v) => { - v.permissions = toPermissionsList(v.permissions, email) - return v - }, - ) + let allData: VespaFileWithDrivePermission[] = vespaData.map((v) => ({ + ...v, + permissions: toPermissionsList(v.permissions, email), + })) for (const data of allData) { await insertWithRetry(data, fileSchema) }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
server/integrations/google/sync.ts
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server/integrations/google/sync.ts (2)
server/integrations/google/utils.ts (1)
toPermissionsList
(355-381)server/search/vespa.ts (1)
insertWithRetry
(167-167)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
server/api/chat/chat.ts (1)
791-820
: Attachments still not collected for deletion
getChatMessagesWithAuth
returnsSelectMessage
objects without anattachments
property, soattachmentsToDelete
stays empty and uploaded files remain orphaned. Collect the metadata per message inside the transaction (falling back togetAttachmentsByMessageId
) before callinghandleAttachmentDelete
after commit.@@ - const attachmentsToDelete: AttachmentMetadata[] = [] + const attachmentsToDelete: AttachmentMetadata[] = [] @@ - for (const message of messagesToDelete) { - if (message.attachments && Array.isArray(message.attachments)) { - const attachments = message.attachments as AttachmentMetadata[] - attachmentsToDelete.push(...attachments) - } - } + for (const message of messagesToDelete) { + const attachments = + Array.isArray((message as any).attachments) + ? ((message as any).attachments as AttachmentMetadata[]) + : await getAttachmentsByMessageId(tx, message.externalId, email) + if (attachments?.length) { + attachmentsToDelete.push(...attachments) + } + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
server/api/chat/chat.ts
(5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server/api/chat/chat.ts (4)
server/shared/types.ts (1)
AttachmentMetadata
(707-707)server/db/chat.ts (1)
getChatByExternalIdWithAuth
(70-83)server/db/message.ts (1)
getChatMessagesWithAuth
(47-58)server/api/files.ts (1)
handleAttachmentDelete
(384-492)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
backend is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (1)
server/api/files.ts (1)
293-301
: Add fallback for entity field.The
entity
field at line 301 could beundefined
ifgetFileType
returns aFileType
not present inattachmentFileTypeMap
. This was flagged in a previous review and remains unaddressed.Apply this diff to add a safe fallback:
+ const fileType = getFileType({ type: file.type, name: file.name }) const vespaDoc = { title: file.name, url: "", app: Apps.Attachment, docId: docId, parentId: null, owner: email, photoLink: "", ownerEmail: email, - entity: attachmentFileTypeMap[getFileType({ type: file.type, name: file.name })], + entity: attachmentFileTypeMap[fileType] ?? AttachmentEntity.File,This ensures a valid entity is always provided, preventing potential Vespa ingestion failures.
🧹 Nitpick comments (1)
server/api/files.ts (1)
18-18
: Remove unused GetDocument import
TheGetDocument
import on line 18 isn’t used anywhere in this file; please remove it.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
server/api/chat/agents.ts
(3 hunks)server/api/chat/chat.ts
(5 hunks)server/api/files.ts
(6 hunks)server/api/knowledgeBase.ts
(1 hunks)server/search/utils.ts
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- server/api/chat/agents.ts
- server/api/chat/chat.ts
🧰 Additional context used
🧬 Code graph analysis (1)
server/api/files.ts (7)
server/shared/types.ts (2)
attachmentFileTypeMap
(210-218)AttachmentMetadata
(707-707)server/shared/fileUtils.ts (2)
getFileType
(10-32)isImageFile
(4-8)server/search/vespa.ts (2)
insert
(20-20)DeleteDocument
(24-24)server/utils.ts (1)
getErrorMessage
(103-106)server/search/utils.ts (1)
expandSheetIds
(23-42)server/api/dataSource.ts (1)
DeleteImages
(235-261)server/api/search.ts (1)
handleAttachmentDeleteSchema
(259-261)
🔇 Additional comments (1)
server/api/knowledgeBase.ts (1)
61-61
: LGTM! Clean import refactor.The import path update centralizes
expandSheetIds
in a shared utility module, improving code organization and reusability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
server/api/files.ts
(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server/api/files.ts (8)
server/shared/types.ts (2)
attachmentFileTypeMap
(210-218)AttachmentMetadata
(707-707)server/shared/fileUtils.ts (2)
getFileType
(10-32)isImageFile
(4-8)server/integrations/dataSource/config.ts (2)
getBaseMimeType
(69-71)isImageFile
(84-87)server/search/vespa.ts (3)
insert
(20-20)DeleteDocument
(24-24)GetDocument
(21-21)server/utils.ts (1)
getErrorMessage
(103-106)server/search/utils.ts (1)
expandSheetIds
(23-42)server/api/dataSource.ts (1)
DeleteImages
(235-261)server/api/search.ts (1)
handleAttachmentDeleteSchema
(259-261)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (1)
server/api/files.ts (1)
264-264
: Add fallback for unmapped file types.If
getFileType
returns aFileType
not present inattachmentFileTypeMap
, the entity field will beundefined
, which could cause downstream issues in Vespa or search logic.Apply this defensive pattern at both locations (lines 264 and 326):
- entity: attachmentFileTypeMap[getFileType({ type: file.type, name: file.name })], + entity: attachmentFileTypeMap[getFileType({ type: file.type, name: file.name })] ?? AttachmentEntity.File,Also applies to: 326-326
🧹 Nitpick comments (1)
server/api/files.ts (1)
409-513
: Deletion function lacks internal permission validation.While
handleAttachmentDeleteApi
validates permissions before calling this function,handleAttachmentDelete
is also called directly fromchat.ts
(ChatDeleteApi). In that flow, chat ownership is verified before collecting attachments, providing implicit authorization. However, as a reusable utility function, it would be safer to include permission validation insidehandleAttachmentDelete
itself to prevent misuse if called from new locations in the future.Consider adding an optional permission check inside
handleAttachmentDelete
:-export const handleAttachmentDelete = async (attachments: AttachmentMetadata [], email: string) => { +export const handleAttachmentDelete = async ( + attachments: AttachmentMetadata [], + email: string, + skipPermissionCheck = false +) => { + if (!skipPermissionCheck) { + // Verify permissions for each attachment before deletion + for (const attachment of attachments) { + if (!attachment.fileId) continue + const vespaIds = expandSheetIds(attachment.fileId) + for (const vespaId of vespaIds) { + try { + const schema = vespaId.startsWith("att_") ? KbItemsSchema : fileSchema + const doc = await GetDocument(schema, vespaId) + if (doc?.fields?.permissions && !doc.fields.permissions.includes(email)) { + loggerWithChild({ email }).warn( + `User ${email} attempted to delete attachment ${vespaId} without permission` + ) + throw new HTTPException(403, { message: "Access denied" }) + } + } catch (error) { + if (error instanceof HTTPException) throw error + // Document not found, allow deletion to proceed + } + } + } + } const imageAttachmentFileIds: string[] = []Then call it with
skipPermissionCheck: true
fromChatDeleteApi
where permissions are already validated upstream.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
server/api/files.ts
(6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server/api/files.ts (8)
server/shared/types.ts (2)
attachmentFileTypeMap
(210-218)AttachmentMetadata
(707-707)server/shared/fileUtils.ts (2)
getFileType
(10-32)isImageFile
(4-8)server/integrations/dataSource/config.ts (2)
getBaseMimeType
(69-71)isImageFile
(84-87)server/search/vespa.ts (3)
insert
(20-20)DeleteDocument
(24-24)GetDocument
(21-21)server/utils.ts (1)
getErrorMessage
(103-106)server/search/utils.ts (1)
expandSheetIds
(23-42)server/api/dataSource.ts (1)
DeleteImages
(235-261)server/api/search.ts (1)
handleAttachmentDeleteSchema
(259-261)
🔇 Additional comments (6)
server/api/files.ts (6)
255-278
: Image attachments now create Vespa documents with image chunk fields.This change aligns with the PR objective to store attachments in the file schema. Previously, images were only stored on disk. Now they have Vespa metadata records with
image_chunks
andimage_chunks_pos
fields for potential future embedding/search capabilities.However, note the potential issue at line 264 below.
439-448
: Path traversal validation is correctly implemented.The validation prevents directory traversal attacks by checking for
..
,/
, and\
characters in the fileId before constructing file paths.
487-491
: Schema selection handles migration path correctly.The logic correctly routes legacy
att_
attachments toKbItemsSchema
and newattf_
attachments tofileSchema
, supporting the migration from the old schema to the new one.
535-538
: Permission check correctly handles edge cases.The
Array.isArray
check ensures that ifpermissions
is not an array (e.g., malformed data), it safely defaults to an empty array, preventing runtime errors.This addresses the concern raised by gemini-code-assist in the past review.
454-461
: Image deletion now removes Vespa documents. Legacy attachments without Vespa metadata may causeDeleteDocument
to error (404); verify there are no legacy image directories in your production storage path (e.g.$IMAGE_DIR
), or wrapDeleteDocument
in error handling to ignore missing Vespa documents.
527-531
: Add documentation for legacy attachment limitation
This endpoint only deletes new attachments (attf_*
viafileSchema
); legacyatt_*
documents inKbItemsSchema
are removed by the chat‐deletion flow—add a comment here to clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
backend is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
server/api/chat/chat.ts (1)
771-785
: Attachments are still never fetched
SelectMessage
rows coming fromgetChatMessagesWithAuth
don’t hydrate anattachments
array, soattachmentsToDelete
remains empty. The earlier review already pointed this out—please fetch viagetAttachmentsByMessageId(tx, message.externalId, email)
inside the loop, collect them, and only callhandleAttachmentDelete
after the transaction.
🧹 Nitpick comments (1)
server/vespa/schemas/file.sd (1)
50-71
: LGTM! Consider clarifying field naming.The new fields for image chunk support are well-structured and consistent with the existing
chunks
fields. The use of BM25 indexing onimage_chunks
is appropriate.Minor observation: The name
image_chunks
suggests these contain text extracted from or describing images (OCR text, captions, etc.) rather than binary image data, which is correct given thearray<string>
type. This is fine, though if you want to be more explicit, names likeimage_text_chunks
could clarify the content type.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
server/api/chat/chat.ts
(6 hunks)server/package.json
(1 hunks)server/vespa/schemas/file.sd
(6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- server/package.json
🧰 Additional context used
🧬 Code graph analysis (1)
server/api/chat/chat.ts (4)
server/shared/types.ts (2)
AttachmentEntity
(39-39)AttachmentMetadata
(707-707)server/db/chat.ts (1)
getChatByExternalIdWithAuth
(70-83)server/db/message.ts (1)
getChatMessagesWithAuth
(47-58)server/api/files.ts (1)
handleAttachmentDelete
(409-513)
🪛 GitHub Actions: TypeScript Build Check
server/api/chat/chat.ts
[error] 1973-1973: api/chat/chat.ts(1973,38): Property 'attachmentRank' does not exist on type 'typeof SearchModes'.
🔇 Additional comments (6)
server/vespa/schemas/file.sd (6)
3-7
: LGTM! Well-structured metadata.The
chunk_meta
struct provides appropriate fields for tracking chunk metadata including index, page numbers, and block labels.
117-121
: Verify DIMS configuration at deployment.The
image_chunk_embeddings
field is correctly configured with the same pattern aschunk_embeddings
. The indexing pipeline will automatically embed theimage_chunks
field using the configured embedder.Ensure that the
DIMS
placeholder is properly replaced with the actual embedding dimension at deployment time, and verify that the embedder model handles both regular text chunks and image-extracted text effectively.
196-219
: LGTM! Image-aware scoring functions properly implemented.The
initial_image
rank profile correctly extends the baseinitial
profile with image-specific scoring functions:
vector_score_image()
combines text and image embeddings scorescombined_nativeRank_image()
includes proper zero-division protection- All functions appropriately aggregate scores from both text and image chunks
The implementation follows Vespa ranking best practices.
249-275
: LGTM! Attachment ranking profile correctly configured.The
attachmentRank
profile properly:
- Inherits image-aware scoring from
initial_image
- Uses
vector_score_image
andcombined_nativeRank_image
in both first-phase and global-phase- Applies recency decay via
doc_recency
- Exposes comprehensive match-features including
image_chunk_scores
The ranking expressions are syntactically valid and follow the expected pattern for hybrid search with image support.
441-450
: LGTM! Document summaries properly configured.The new summaries are correctly defined:
image_chunks_summary
has bolding enabled for search result highlighting- Position summaries (
chunks_pos_summary
,image_chunks_pos_summary
) are properly sourcedThese summaries will enable retrieval of image chunk data and position information in query results.
135-141
: Verify fieldset configuration for image chunks.The
image_chunks
field is not included in any fieldset (neitherdefault
norautocomplete
). This means generic queries won't automatically search image chunk content unless the field is explicitly specified.Please confirm this is intentional—typically this would be correct if:
- Attachment/image searches always use the
attachmentRank
profile with explicit field targeting- You want to prevent generic searches from matching image-extracted text
If image chunks should be searchable in general queries, consider adding them to the
default
fieldset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
server/api/chat/agents.ts (2)
873-874
: Add null-safety check for appIntegrations.The type assertion
as string[]
is unsafe becauseagentForDb?.appIntegrations
can benull
orundefined
. CallingflatMap
on a null/undefined value will cause a runtime exception.Apply this diff to add proper null-checking:
-const appIntegrations = agentForDb?.appIntegrations as string[] -const appIntegrationsWithSheetIds = appIntegrations.flatMap(expandSheetIds) +const appIntegrations = (agentForDb?.appIntegrations as string[]) || [] +const appIntegrationsWithSheetIds = appIntegrations.flatMap(expandSheetIds)Alternatively, use optional chaining:
-const appIntegrations = agentForDb?.appIntegrations as string[] -const appIntegrationsWithSheetIds = appIntegrations.flatMap(expandSheetIds) +const appIntegrations = agentForDb?.appIntegrations as string[] | undefined +const appIntegrationsWithSheetIds = appIntegrations?.flatMap(expandSheetIds) ?? []
3873-3874
: Add null-safety check for appIntegrations.The type assertion
as string[]
is unsafe becauseagentForDb?.appIntegrations
can benull
orundefined
. CallingflatMap
on a null/undefined value will cause a runtime exception.Apply this diff to add proper null-checking:
-const appIntegrations = agentForDb?.appIntegrations as string[] -const appIntegrationsWithSheetIds = appIntegrations.flatMap(expandSheetIds) +const appIntegrations = (agentForDb?.appIntegrations as string[]) || [] +const appIntegrationsWithSheetIds = appIntegrations.flatMap(expandSheetIds)Alternatively, use optional chaining:
-const appIntegrations = agentForDb?.appIntegrations as string[] -const appIntegrationsWithSheetIds = appIntegrations.flatMap(expandSheetIds) +const appIntegrations = agentForDb?.appIntegrations as string[] | undefined +const appIntegrationsWithSheetIds = appIntegrations?.flatMap(expandSheetIds) ?? []
♻️ Duplicate comments (1)
server/api/chat/chat.ts (1)
782-785
: Attachments never get deleted
getChatMessagesWithAuth
returnsSelectMessage
rows that don’t include attachment metadata (we persist it viastoreAttachmentMetadata
). As a resultattachmentsToDelete
stays empty and nothing is passed tohandleAttachmentDelete
, so files survive chat deletion. Fetch the metadata per message inside the transaction (falling back tomessage.attachments
only if present) before pushing intoattachmentsToDelete
.- for (const message of messagesToDelete) { - if (message.attachments && Array.isArray(message.attachments)) { - const attachments = message.attachments as AttachmentMetadata[] - attachmentsToDelete.push(...attachments) - } - } + for (const message of messagesToDelete) { + const attachments = Array.isArray((message as any).attachments) + ? ((message as any).attachments as AttachmentMetadata[]) + : await getAttachmentsByMessageId(tx, message.externalId, email) + + if (attachments.length) { + attachmentsToDelete.push(...attachments) + } + }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
server/api/chat/agents.ts
(5 hunks)server/api/chat/chat.ts
(9 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
server/api/chat/agents.ts (3)
server/shared/types.ts (2)
AttachmentEntity
(39-39)Apps
(40-40)server/search/utils.ts (1)
expandSheetIds
(23-42)server/search/vespa.ts (1)
getAllDocumentsForAgent
(160-160)
server/api/chat/chat.ts (3)
server/shared/types.ts (2)
AttachmentEntity
(39-39)AttachmentMetadata
(707-707)server/api/files.ts (1)
handleAttachmentDelete
(409-513)server/search/utils.ts (1)
expandSheetIds
(23-42)
🪛 GitHub Actions: TypeScript Build Check
server/api/chat/chat.ts
[error] 1973-1973: Property 'attachmentRank' does not exist on type 'typeof SearchModes'.
🔇 Additional comments (2)
server/api/chat/agents.ts (2)
396-398
: LGTM! Attachment filtering is correct.The logic properly filters out attachment entities from citations by checking against the
AttachmentEntity
enum values.
847-849
: LGTM! Sheet expansion for attachments is correct.The code correctly expands sheet IDs for non-image attachments using
flatMap(expandSheetIds)
, ensuring multi-sheet files are properly handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
server/vespa/schemas/file.sd (1)
262-271
: Consider adding individual image field metrics for completeness.While the combined image-aware metrics are present, adding
nativeRank(image_chunks)
andbm25(image_chunks)
to the match-features list could be helpful for debugging and fine-tuning the ranking behavior of image-based searches.You can apply this diff to add the individual metrics:
match-features { matchedFieldCount_image vector_score_image combined_nativeRank_image nativeRank(title) nativeRank(chunks) + nativeRank(image_chunks) chunk_scores image_chunk_scores + bm25(image_chunks) doc_recency }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
server/vespa/schemas/file.sd
(5 hunks)
🔇 Additional comments (6)
server/vespa/schemas/file.sd (6)
3-7
: LGTM!The
chunk_meta
struct is well-defined with appropriate types for tracking chunk metadata including index, page numbers, and block labels.
50-71
: LGTM!The new document fields for image chunk support are properly defined with appropriate types and indexing directives. The symmetric structure between text chunks and image chunks is clean and maintainable.
117-121
: LGTM!The
image_chunk_embeddings
field is correctly configured with the appropriate tensor type, indexing pipeline, distance metric, and memory optimization. It's consistent with the existingchunk_embeddings
field structure.
223-272
: LGTM! Past issues resolved.The
attachmentRank
profile is correctly implemented with image-aware scoring functions that properly combine text and image signals. Both first-phase and global-phase expressions use the appropriate_image
variants, and match-features include all relevant image metrics. The previously reported issues with malformed expressions and missing image signals have been successfully resolved.
135-137
: Verify ifimage_chunks
should be included in the default fieldset.The
default
fieldset excludesimage_chunks
, meaning queries using this fieldset won't search image content unless explicitly specified. This might be intentional if image search requires explicit selection of theattachmentRank
profile, but please confirm this design choice aligns with your search requirements.
438-447
: LGTM!The document summaries for image chunks and positions are properly defined with appropriate bolding settings and source references, consistent with the existing text chunk summaries.
Description
now attachments will be ingested into file schema instead of kb_items schema
Summary by CodeRabbit
New Features
Bug Fixes
Chores